# Multimodal visual understanding
**Wr30a Deep 7B 0711 I1 GGUF** · Apache-2.0 · uploaded by mradermacher
A quantized GGUF build of prithivMLmods/WR30a-Deep-7B-0711 that supports multiple languages and is suited to tasks such as text generation and image captioning; a loading sketch follows this entry.
Image-to-Text · Transformers · multilingual

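GGUF quantizations like this one are normally loaded with a llama.cpp-based runtime rather than through the Transformers Python API. Below is a minimal text-generation sketch using llama-cpp-python; the filename and quantization level are assumptions, so substitute whichever .gguf file you actually download from the repository. Image captioning would additionally require the model's multimodal projector and a matching chat handler, if the repository provides one.

```python
# Minimal sketch: text generation from a local GGUF file with llama-cpp-python.
# The filename below is an assumption; pick any quantization level offered in the repo.
from llama_cpp import Llama

llm = Llama(
    model_path="WR30a-Deep-7B-0711.i1-Q4_K_M.gguf",  # assumed filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if a GPU-enabled build is installed
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a one-sentence caption for a photo of a mountain lake."}],
    max_tokens=64,
)
print(result["choices"][0]["message"]["content"])
```
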
**Gemma 3 12b It Quantized.w8a8** · uploaded by RedHatAI
An INT8 (W8A8) quantized version of google/gemma-3-12b-it that takes image and text input and produces text output, intended for efficient inference deployment; a serving sketch follows this entry.
Image-to-Text · Transformers

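W8A8 INT8 checkpoints of this kind are usually served with vLLM, which can read the quantization config directly. The following is a minimal offline-inference sketch, not the vendor's documented recipe: the repo id, the image placeholder in the prompt, and the multimodal input format are assumptions, so check the model card for the exact usage.

```python
# Minimal sketch: offline inference of an INT8 (w8a8) vision-language checkpoint with vLLM.
# The repo id and prompt template are assumptions; consult the model card for exact usage.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="RedHatAI/gemma-3-12b-it-quantized.w8a8", max_model_len=8192)  # assumed repo id

image = Image.open("photo.jpg")
prompt = "<start_of_image>Describe this image in one sentence."  # assumed image placeholder token

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.2, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```
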
**Qwen2.5 VL 72B Instruct GGUF** · Other license · uploaded by unsloth
A GGUF quantization of Qwen2.5-VL-72B-Instruct, the latest vision-language model in the Qwen family, with strong visual understanding and video analysis capabilities for domains such as finance and commerce.
Image-to-Text · Transformers · English

**Qwen2.5 VL 32B Instruct Exl2 4.25bpw** · Apache-2.0 · uploaded by christopherthompson81
An EXL2 quantization (4.25 bits per weight) of Qwen2.5-VL-32B-Instruct, a vision-language model in the Qwen family with strong multimodal understanding and generation across images, video, and text.
Image-to-Text · Transformers · English

**Amoral Gemma3 12B Vision** · uploaded by gghfez
A vision-enhanced variant of soob3123/amoral-gemma3-12B that pairs the Gemma3-12B language model with a vision encoder for multimodal tasks; a Transformers usage sketch follows this entry.
Image-to-Text · Transformers · English

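Because the card lists Transformers as the library, a generic image-text-to-text pipeline should be enough to try the model on a single image. This is only a sketch under assumptions: the repo id is inferred from the base model named in the description, and the pipeline task name requires a reasonably recent transformers release.

```python
# Minimal sketch: image captioning via the Transformers image-text-to-text pipeline.
# The repo id is an assumption inferred from the base model named in the description.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="soob3123/amoral-gemma3-12B-vision",  # assumed repo id
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image briefly."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64, return_full_text=False)
print(out[0]["generated_text"])
```
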
**Qwen2 VL 72B Instruct GGUF** · Other license · uploaded by second-state
A GGUF quantization of Qwen2-VL-72B-Instruct for multimodal image-text-to-text generation; it can be served with LlamaEdge, as sketched below.
Image-to-Text · Transformers · English

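LlamaEdge's llama-api-server exposes an OpenAI-compatible endpoint once a GGUF file is loaded under WasmEdge, so any OpenAI client can query it. The sketch below assumes a server is already running locally on port 8080 and that the model was registered under the name shown; both details, and the base64 image format, are assumptions to verify against the second-state model card.

```python
# Minimal sketch: querying a locally running LlamaEdge (llama-api-server) instance
# through its OpenAI-compatible API. Port, model name, and image format are assumptions.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Qwen2-VL-72B-Instruct",  # assumed name chosen when the server was launched
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text", "text": "What is shown in this image?"},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```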